1 Introduction

RMS Titanic was a British passenger liner that sank in the North Atlantic Ocean in the early morning hours of April 15, 1912, after striking an iceberg during her maiden voyage from Southampton to New York City. Of the estimated 2,224 passengers and crew aboard, more than 1,500 died (including 815 of its passengers), making the sinking one of modern history’s deadliest peacetime commercial marine disasters.

2 Passenger demographics

The Titanic’s passengers were divided into three separate classes determined by the price of their ticket: those travelling in first class, most of them the wealthiest passengers on board, included prominent members of the upper class, businessmen, politicians, high-ranking military personnel, industrialists, bankers, entertainers, socialites, and professional athletes. Second-class passengers were predominantly middle-class travellers and included professors, authors, clergymen, and tourists. Third-class or steerage passengers were primarily emigrants moving to the United States and Canada.

Titanic’s passengers numbered 1,317 people: 324 in first class, 285 in second class, and 708 in third class. The majority were male, 434 were female, and 112 children were aboard, the largest number of whom were in third class. The ship was considerably under capacity on her maiden voyage, as she could accommodate 2,453 passengers: 833 in first class, 614 in second class, and 1,006 in third class.

getwd()
## [1] "/Users/valencialie/Desktop"
setwd("/Users/valencialie/Desktop")
passenger <- read.csv(file = "train.csv")
head(passenger)
##   PassengerId Survived Pclass
## 1           1        0      3
## 2           2        1      1
## 3           3        1      3
## 4           4        1      1
## 5           5        0      3
## 6           6        0      3
##                                                  Name    Sex Age SibSp Parch
## 1                             Braund, Mr. Owen Harris   male  22     1     0
## 2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female  38     1     0
## 3                              Heikkinen, Miss. Laina female  26     0     0
## 4        Futrelle, Mrs. Jacques Heath (Lily May Peel) female  35     1     0
## 5                            Allen, Mr. William Henry   male  35     0     0
## 6                                    Moran, Mr. James   male  NA     0     0
##             Ticket    Fare Cabin Embarked
## 1        A/5 21171  7.2500              S
## 2         PC 17599 71.2833   C85        C
## 3 STON/O2. 3101282  7.9250              S
## 4           113803 53.1000  C123        S
## 5           373450  8.0500              S
## 6           330877  8.4583              Q
str(passenger)
## 'data.frame':    891 obs. of  12 variables:
##  $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
##  $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
##  $ Name       : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
##  $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
##  $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
##  $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
##  $ Ticket     : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
##  $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Cabin      : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
##  $ Embarked   : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...

2.1 Data cleansing

passenger$Name <- as.character(passenger$Name)
class(passenger$Name)
## [1] "character"
passenger$Ticket <- as.character(passenger$Ticket)
class(passenger$Ticket)
## [1] "character"
passenger$Cabin <- as.character(passenger$Cabin)
class(passenger$Cabin)
## [1] "character"
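
The three conversions above can also be written as a single step. The following is a minimal sketch on a toy data frame (the values and the name `text_cols` are illustrative assumptions, not part of the original analysis):

```r
# Toy data frame standing in for the passenger table (illustrative values).
df <- data.frame(
  Name   = factor(c("Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina")),
  Ticket = factor(c("A/5 21171", "STON/O2. 3101282")),
  Cabin  = factor(c("", "C85")),
  Sex    = factor(c("male", "female"))
)

# Columns that hold free text rather than categories (hypothetical helper name).
text_cols <- c("Name", "Ticket", "Cabin")

# Convert every listed column; df[text_cols] keeps the data-frame structure.
df[text_cols] <- lapply(df[text_cols], as.character)

# Class of each column after the conversion: Sex is left as a factor.
col_classes <- sapply(df, class)
print(col_classes)
```

Listing the columns by name keeps the intent visible and avoids repeating the same assignment for each column.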

2.1.1 Missing Data

colSums(is.na(passenger))
## PassengerId    Survived      Pclass        Name         Sex         Age 
##           0           0           0           0           0         177 
##       SibSp       Parch      Ticket        Fare       Cabin    Embarked 
##           0           0           0           0           0           0

Although only 177 of the 891 ages (about 20%) are missing, a passenger’s age is crucial in determining their likelihood of surviving the shipwreck, so we keep the column and fill the missing values with the median age of all passengers.

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
passenger_new <- preProcess(passenger, method = c("medianImpute"))
passenger_new <- predict(passenger_new, passenger)
head(passenger_new)
##   PassengerId Survived Pclass
## 1           1        0      3
## 2           2        1      1
## 3           3        1      3
## 4           4        1      1
## 5           5        0      3
## 6           6        0      3
##                                                  Name    Sex Age SibSp Parch
## 1                             Braund, Mr. Owen Harris   male  22     1     0
## 2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female  38     1     0
## 3                              Heikkinen, Miss. Laina female  26     0     0
## 4        Futrelle, Mrs. Jacques Heath (Lily May Peel) female  35     1     0
## 5                            Allen, Mr. William Henry   male  35     0     0
## 6                                    Moran, Mr. James   male  28     0     0
##             Ticket    Fare Cabin Embarked
## 1        A/5 21171  7.2500              S
## 2         PC 17599 71.2833   C85        C
## 3 STON/O2. 3101282  7.9250              S
## 4           113803 53.1000  C123        S
## 5           373450  8.0500              S
## 6           330877  8.4583              Q
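
For a single column, the same median imputation can be sketched in base R. This is an illustration under assumptions, not the method used above: the ages are toy values and `impute_median` is a hypothetical helper. It also keeps a flag marking which values were filled in, since every imputed passenger ends up with the same age.

```r
# Illustrative ages with gaps; not taken from train.csv.
age <- c(22, 38, 26, 35, 35, NA, 54, 2, NA)

impute_median <- function(x) {
  # Median of the observed values only.
  med <- median(x, na.rm = TRUE)
  x[is.na(x)] <- med
  x
}

# Remember which entries were missing before filling them.
age_was_missing <- is.na(age)
age_filled <- impute_median(age)

print(age_filled)
print(age_was_missing)
```

Keeping `age_was_missing` makes it possible to exclude or separately inspect the imputed rows in later age-based summaries.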

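The check above reports no missing values for Cabin only because unknown cabins are stored as empty strings rather than NA; as the first rows show, most Cabin entries are blank. Because we are unlikely to gain useful insight from so sparse a column, we delete the column Cabin.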

passenger_newnew <- passenger_new[,-11]
head(passenger_newnew)
##   PassengerId Survived Pclass
## 1           1        0      3
## 2           2        1      1
## 3           3        1      3
## 4           4        1      1
## 5           5        0      3
## 6           6        0      3
##                                                  Name    Sex Age SibSp Parch
## 1                             Braund, Mr. Owen Harris   male  22     1     0
## 2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female  38     1     0
## 3                              Heikkinen, Miss. Laina female  26     0     0
## 4        Futrelle, Mrs. Jacques Heath (Lily May Peel) female  35     1     0
## 5                            Allen, Mr. William Henry   male  35     0     0
## 6                                    Moran, Mr. James   male  28     0     0
##             Ticket    Fare Embarked
## 1        A/5 21171  7.2500        S
## 2         PC 17599 71.2833        C
## 3 STON/O2. 3101282  7.9250        S
## 4           113803 53.1000        S
## 5           373450  8.0500        S
## 6           330877  8.4583        Q
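
Because blank strings are not counted by `is.na()`, a missing-value check that also covers them can be sketched as follows. The data frame is a toy example and `count_missing` is a hypothetical helper name:

```r
# Toy data: Cabin uses "" for unknown values, Age uses NA (illustrative values).
df <- data.frame(
  Age   = c(22, NA, 26, 35),
  Cabin = c("", "C85", "", ""),
  stringsAsFactors = FALSE
)

count_missing <- function(col) {
  # Treat both NA and the empty string as missing.
  sum(is.na(col) | as.character(col) == "")
}

missing_counts <- sapply(df, count_missing)
print(missing_counts)
```

Alternatively, passing `na.strings = c("", "NA")` to `read.csv()` turns blank fields into NA at load time, so that `colSums(is.na(...))` counts them directly.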

2.2 Subsetting data

2.2.1 Gender and age

To gain insight into the passengers’ survival, we separate the survivors from those who did not survive the shipwreck.

survivors <- passenger_newnew[passenger_newnew$Survived == 1,]
head(survivors)
##    PassengerId Survived Pclass
## 2            2        1      1
## 3            3        1      3
## 4            4        1      1
## 9            9        1      3
## 10          10        1      2
## 11          11        1      3
##                                                   Name    Sex Age SibSp Parch
## 2  Cumings, Mrs. John Bradley (Florence Briggs Thayer) female  38     1     0
## 3                               Heikkinen, Miss. Laina female  26     0     0
## 4         Futrelle, Mrs. Jacques Heath (Lily May Peel) female  35     1     0
## 9    Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female  27     0     2
## 10                 Nasser, Mrs. Nicholas (Adele Achem) female  14     1     0
## 11                     Sandstrom, Miss. Marguerite Rut female   4     1     1
##              Ticket    Fare Embarked
## 2          PC 17599 71.2833        C
## 3  STON/O2. 3101282  7.9250        S
## 4            113803 53.1000        S
## 9            347742 11.1333        S
## 10           237736 30.0708        C
## 11          PP 9549 16.7000        S
notsurvive <- passenger_newnew[passenger_newnew$Survived == 0,]

head(notsurvive)
##    PassengerId Survived Pclass                           Name  Sex Age SibSp
## 1            1        0      3        Braund, Mr. Owen Harris male  22     1
## 5            5        0      3       Allen, Mr. William Henry male  35     0
## 6            6        0      3               Moran, Mr. James male  28     0
## 7            7        0      1        McCarthy, Mr. Timothy J male  54     0
## 8            8        0      3 Palsson, Master. Gosta Leonard male   2     3
## 13          13        0      3 Saundercock, Mr. William Henry male  20     0
##    Parch    Ticket    Fare Embarked
## 1      0 A/5 21171  7.2500        S
## 5      0    373450  8.0500        S
## 6      0    330877  8.4583        Q
## 7      0     17463 51.8625        S
## 8      1    349909 21.0750        S
## 13     0 A/5. 2151  8.0500        S
table(notsurvive$Sex)
## 
## female   male 
##     81    468
table(survivors$Sex)
## 
## female   male 
##    233    109

From these tables alone, we can tell that women outnumber men among the survivors by about 2.14:1 (233 to 109), while men outnumber women among those who did not survive by about 5.78:1 (468 to 81).

These ratios are plausible because, during the sinking, women and children were given priority for the lifeboats, which boosted their chance of surviving compared with men.
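
The ratios quoted above can be reproduced from the four counts in the two tables. The sketch below also derives the survival rate within each sex, which is a complementary view; the object names are illustrative.

```r
# Counts taken from the two tables above.
counts <- matrix(
  c(81, 468,    # did not survive: female, male
    233, 109),  # survived:        female, male
  nrow = 2, byrow = TRUE,
  dimnames = list(Survived = c("0", "1"), Sex = c("female", "male"))
)

# Ratios quoted in the text.
ratio_survivors <- counts["1", "female"] / counts["1", "male"]  # women : men among survivors
ratio_deaths    <- counts["0", "male"] / counts["0", "female"]  # men : women among non-survivors

# Survival rate within each sex: survivors divided by all passengers of that sex.
rate_by_sex <- counts["1", ] / colSums(counts)

print(round(c(ratio_survivors, ratio_deaths), 2))
print(round(rate_by_sex, 3))
```

The rates divide by the number of passengers of each sex, so they are not affected by there being more men than women in the data.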

Let’s take a look at how age relates to the survival of women.

agg1 <- as.data.frame(prop.table(table(survivors$Sex == "female", survivors$Age)))
agg2 <- agg1[agg1$Var1 == TRUE,]
agg6 <- agg2[order(agg2$Freq, decreasing = T),]
head(agg6)
##    Var1 Var2       Freq
## 64 TRUE   28 0.11988304
## 56 TRUE   24 0.04093567
## 52 TRUE   22 0.02923977
## 68 TRUE   30 0.02631579
## 44 TRUE   18 0.02339181
## 80 TRUE   35 0.02339181
library(ggplot2)

ggplot(data = agg2, mapping = aes(x = Var2, y = Freq)) +
  geom_col(aes(fill = Freq)) +
  scale_fill_viridis_c() +
  labs(x = "Age", y = "Probability of survival", title = "Probability of survival for women based on age", fill = "Probability") +
  coord_flip()

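From the above data, we can tell that women aged 28 form the largest group among the survivors, accounting for 0.1199 of all survivors. Note that Freq is a share of all survivors rather than a survival rate, and that 28 is the median age used to fill the missing values, so this group also contains every surviving woman whose age was unknown.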
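
To make clear what `prop.table(table(...))` measures, here is a small sketch on toy records (illustrative values, not from train.csv). Without a `margin` argument, the proportions are taken over the whole table; with `margin = 1`, they are taken within each row.

```r
# Toy survivor records (illustrative, not from train.csv).
toy <- data.frame(
  Sex = c("female", "female", "female", "male", "male", "female"),
  Age = c(28, 28, 24, 28, 32, 24)
)

tab <- table(toy$Sex, toy$Age)

# Share of ALL rows that fall in each (sex, age) cell.
joint <- prop.table(tab)

# Share within each sex (each row sums to 1): the age distribution per sex.
within_sex <- prop.table(tab, margin = 1)

print(joint)
print(within_sex)
```

In this toy table, women aged 28 are 2 of 6 records overall but 2 of 4 women, so the two views give different numbers for the same cell.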

Next, let’s examine how age relates to the survival of men.

agg3 <- as.data.frame(prop.table(table(survivors$Sex == "male", survivors$Age)))
agg4 <- agg3[agg3$Var1 == TRUE,]
head(agg4)
##    Var1 Var2        Freq
## 2  TRUE 0.42 0.002923977
## 4  TRUE 0.67 0.002923977
## 6  TRUE 0.75 0.000000000
## 8  TRUE 0.83 0.005847953
## 10 TRUE 0.92 0.002923977
## 12 TRUE    1 0.008771930
agg5 <- agg4[order(agg4$Freq, decreasing = T),]
head(agg5)
##    Var1 Var2       Freq
## 64 TRUE   28 0.05263158
## 72 TRUE   32 0.02046784
## 62 TRUE   27 0.01754386
## 16 TRUE    3 0.01169591
## 58 TRUE   25 0.01169591
## 82 TRUE   36 0.01169591
ggplot(data = agg4, mapping = aes(x = Var2, y = Freq)) +
  geom_col(aes(fill = Freq)) +
  scale_fill_viridis_c() +
  labs(x = "Age", y = "Probability of survival", title = "Probability of survival for men based on age", fill = "Probability") +
  coord_flip()

From the above data, we can tell that men aged 28 likewise form the largest group among the male survivors, accounting for 0.0526 of all survivors.

Taken at face value, this suggests that passengers aged 28, regardless of gender, were the most likely to survive the shipwreck.

There is a plausible reason behind the number 28: the age range of 25 to 30 is often said to be when the human body is at its fittest and strongest, which would make it reasonable for passengers aged 28 to have the best chance of survival.

On the other hand, let’s look at the passengers who did not survive, starting with women.

s1 <- as.data.frame(prop.table(table(notsurvive$Sex == "female", notsurvive$Age)))
s2 <- s1[s1$Var1 == TRUE,]
s6 <- s2[order(s2$Freq, decreasing = T),]
head(s6)
##    Var1 Var2        Freq
## 58 TRUE   28 0.034608379
## 32 TRUE   18 0.009107468
## 4  TRUE    2 0.007285974
## 16 TRUE    9 0.007285974
## 40 TRUE   21 0.005464481
## 52 TRUE   25 0.005464481
library(ggplot2)

ggplot(data = s2, mapping = aes(x = Var2, y = Freq)) +
  geom_col(aes(fill = Freq)) +
  scale_fill_viridis_c() +
  labs(x = "Age", y = "Probability of non survival", title = "Probability of non survival for women based on age", fill = "Probability") +
  coord_flip()

From the above data, we can tell that among the women who did not survive, those aged 28 form the largest group, accounting for 0.0346 of all non-survivors.

For men:

s3 <- as.data.frame(prop.table(table(notsurvive$Sex == "male", notsurvive$Age)))
s4 <- s3[s3$Var1 == TRUE,]
head(s4)
##    Var1 Var2        Freq
## 2  TRUE    1 0.003642987
## 4  TRUE    2 0.005464481
## 6  TRUE    3 0.000000000
## 8  TRUE    4 0.005464481
## 10 TRUE    6 0.000000000
## 12 TRUE    7 0.003642987
s5 <- s4[order(s4$Freq, decreasing = T),]
head(s5)
##    Var1 Var2       Freq
## 58 TRUE   28 0.22586521
## 34 TRUE   19 0.02914390
## 40 TRUE   21 0.02914390
## 42 TRUE   22 0.02550091
## 52 TRUE   25 0.02550091
## 48 TRUE   24 0.02367942
library(ggplot2)

ggplot(data = s4, mapping = aes(x = Var2, y = Freq)) +
  geom_col(aes(fill = Freq)) +
  scale_fill_viridis_c() +
  labs(x = "Age", y = "Probability of non survival", title = "Probability of non survival for men based on age", fill = "Probability") +
  coord_flip()

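Similarly, among the men who did not survive, those aged 28 form the largest group, accounting for 0.2259 of all non-survivors.

This may seem to contradict what we assumed earlier, since age 28 has the highest share among both the survivors and the non-survivors, for men and women alike. The explanation is that these figures are shares of each group rather than survival rates, and that every passenger with an unknown age was assigned the median age of 28, which makes 28 by far the most common age in both groups.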
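
The effect of filling unknown ages with one value can be illustrated on a toy vector. The ages below are illustrative assumptions chosen so that the median is 28, as in the data above.

```r
# Toy ages: NA marks an unknown age (illustrative values).
age_raw <- c(22, 28, 35, NA, NA, NA, 40, 19, NA)
imputed <- is.na(age_raw)

# Fill the gaps with the median of the known ages (28 here).
age_filled <- age_raw
age_filled[imputed] <- median(age_raw, na.rm = TRUE)

# Most frequent age after imputation, and how often it occurs.
top_after   <- names(which.max(table(age_filled)))
count_after <- max(table(age_filled))

# Highest frequency among the known ages only.
count_known <- max(table(age_raw[!imputed]))

print(c(top_after = top_after, count_after = count_after, count_known = count_known))
```

Among the known ages every value occurs once, yet after imputation age 28 occurs five times, so any ranking of ages by frequency is dominated by the filled-in value.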

2.2.2 Gender and Passenger class

Next, we will try to find which class gives passengers the best chance of survival.

table(survivors$Pclass, survivors$Sex)
##    
##     female male
##   1     91   45
##   2     70   17
##   3     72   47

From the data above, first-class passengers appear to have the best chance of survival: 39% of women survivors and 41% of men survivors were first-class passengers. Together, first-class passengers make up almost 40% of all survivors, the largest share of any class.

Second-class passengers appear to have the lowest chance of survival: only 30% of women survivors and 16% of men survivors were second-class passengers. Together, second-class passengers make up only about 25% of all survivors.
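
The percentages quoted above follow directly from the table of survivors. A minimal sketch, with the counts copied from that table and illustrative object names:

```r
# Survivor counts by class and sex, copied from the table above.
survivors_by_class <- matrix(
  c(91, 45,
    70, 17,
    72, 47),
  nrow = 3, byrow = TRUE,
  dimnames = list(Pclass = c("1", "2", "3"), Sex = c("female", "male"))
)

# Share of each class among female and among male survivors (columns sum to 1).
share_within_sex <- prop.table(survivors_by_class, margin = 2)

# Share of each class among all survivors.
share_overall <- rowSums(survivors_by_class) / sum(survivors_by_class)

print(round(100 * share_within_sex, 1))
print(round(100 * share_overall, 1))
```

These are shares of the survivors, so they describe who the survivors were rather than how likely a passenger in each class was to survive.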

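Logically speaking, we would expect first-class passengers to have the greatest priority in boarding the lifeboats, followed by second-class passengers and, lastly, third-class passengers. However, this is not what the table shows, as second-class passengers make up the smallest share of survivors. Keep in mind that these shares also depend on how many passengers travelled in each class, which the table of survivors alone does not show.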

According to historical accounts, one possible reason is that when Captain Smith ordered his officers to put the “women and children in and lower away”, two of his officers, Murdoch and Lightoller, interpreted the evacuation order differently: Murdoch took it to mean women and children first, while Lightoller took it to mean women and children only. Lightoller lowered lifeboats with empty seats if no women or children were waiting to board, while Murdoch allowed a limited number of men to board once all the nearby women and children had embarked. This had a significant effect on the survival rates of the men aboard Titanic, whose chances of survival came to depend on which side of the ship they tried to find lifeboat seats on.

Thus, it is possible that most of the second-class men tried to enter lifeboats supervised by Lightoller, which would have greatly reduced their chance of survival and could account for the small share of second-class passengers among the survivors.
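
A survival rate per class and sex, which accounts for how many passengers each group started with, can be sketched with `aggregate()`. The records below are toy values (not from train.csv); only the column names mirror the data set.

```r
# Toy passenger records (illustrative, not from train.csv).
toy <- data.frame(
  Survived = c(1, 0, 1, 1, 0, 0, 0, 1),
  Pclass   = c(1, 1, 2, 2, 2, 3, 3, 3),
  Sex      = c("female", "male", "female", "male", "male", "female", "male", "male")
)

# The mean of a 0/1 column is the fraction of 1s, i.e. the survival rate per group.
rate <- aggregate(Survived ~ Pclass + Sex, data = toy, FUN = mean)
print(rate)
```

Applied to the full passenger table, the same call would give one rate per class and sex, which can then be set beside the shares of survivors discussed above.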

3 Prediction

Using the insights we gained, we will try to predict whether these passengers survive the shipwreck.

test <- read.csv(file = "test.csv")
head(test)
##   PassengerId Pclass                                         Name    Sex  Age
## 1         892      3                             Kelly, Mr. James   male 34.5
## 2         893      3             Wilkes, Mrs. James (Ellen Needs) female 47.0
## 3         894      2                    Myles, Mr. Thomas Francis   male 62.0
## 4         895      3                             Wirz, Mr. Albert   male 27.0
## 5         896      3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female 22.0
## 6         897      3                   Svensson, Mr. Johan Cervin   male 14.0
##   SibSp Parch  Ticket    Fare Cabin Embarked
## 1     0     0  330911  7.8292              Q
## 2     1     0  363272  7.0000              S
## 3     0     0  240276  9.6875              Q
## 4     0     0  315154  8.6625              S
## 5     1     1 3101298 12.2875              S
## 6     0     0    7538  9.2250              S

3.1 Data cleansing

str(test)
## 'data.frame':    418 obs. of  11 variables:
##  $ PassengerId: int  892 893 894 895 896 897 898 899 900 901 ...
##  $ Pclass     : int  3 3 2 3 3 3 3 2 3 3 ...
##  $ Name       : Factor w/ 418 levels "Abbott, Master. Eugene Joseph",..: 210 409 273 414 182 370 85 58 5 104 ...
##  $ Sex        : Factor w/ 2 levels "female","male": 2 1 2 2 1 2 1 2 1 2 ...
##  $ Age        : num  34.5 47 62 27 22 14 30 26 18 21 ...
##  $ SibSp      : int  0 1 0 0 1 0 0 1 0 2 ...
##  $ Parch      : int  0 0 0 0 1 0 0 1 0 0 ...
##  $ Ticket     : Factor w/ 363 levels "110469","110489",..: 153 222 74 148 139 262 159 85 101 270 ...
##  $ Fare       : num  7.83 7 9.69 8.66 12.29 ...
##  $ Cabin      : Factor w/ 77 levels "","A11","A18",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Embarked   : Factor w/ 3 levels "C","Q","S": 2 3 2 3 3 3 2 3 1 3 ...
test$Name <- as.character(test$Name)
test$Ticket <- as.character(test$Ticket)
test$Cabin <- as.character(test$Cabin)

3.1.1 Missing Data

colSums(is.na(test))
## PassengerId      Pclass        Name         Sex         Age       SibSp 
##           0           0           0           0          86           0 
##       Parch      Ticket        Fare       Cabin    Embarked 
##           0           0           1           0           0

Because age is important, we fill in the missing ages with the median, as before. The same step also fills the single missing Fare value with the median fare.

library(caret)
test_new <- preProcess(test, method = c("medianImpute"))
test_new <- predict(test_new, test)
head(test_new)
##   PassengerId Pclass                                         Name    Sex  Age
## 1         892      3                             Kelly, Mr. James   male 34.5
## 2         893      3             Wilkes, Mrs. James (Ellen Needs) female 47.0
## 3         894      2                    Myles, Mr. Thomas Francis   male 62.0
## 4         895      3                             Wirz, Mr. Albert   male 27.0
## 5         896      3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female 22.0
## 6         897      3                   Svensson, Mr. Johan Cervin   male 14.0
##   SibSp Parch  Ticket    Fare Cabin Embarked
## 1     0     0  330911  7.8292              Q
## 2     1     0  363272  7.0000              S
## 3     0     0  240276  9.6875              Q
## 4     0     0  315154  8.6625              S
## 5     1     1 3101298 12.2875              S
## 6     0     0    7538  9.2250              S
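
The code above computes the medians from test.csv itself. A common alternative, sketched here as an assumption rather than as the approach used in this report, is to reuse the medians of the training data so that both data sets are filled with the same values. The data frames and the helper `impute_from_train` are hypothetical.

```r
# Toy stand-ins for the training and test tables (illustrative values).
train <- data.frame(Age = c(20, 30, NA, 40), Fare = c(7, 8, 9, NA))
test  <- data.frame(Age = c(NA, 50),         Fare = c(NA, 12))

impute_from_train <- function(train, test, cols) {
  for (col in cols) {
    # Median of the observed training values for this column.
    med <- median(train[[col]], na.rm = TRUE)
    # Fill only the gaps in the test column.
    test[[col]][is.na(test[[col]])] <- med
  }
  test
}

test_filled <- impute_from_train(train, test, c("Age", "Fare"))
print(test_filled)
```

Which of the two choices is preferable depends on the goal; using the training medians keeps the treatment of the test rows independent of the other test rows.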

For Cabin, we delete the whole column, as we did for the training data, because most of its values are missing.

test_newnew <- test_new[,-10]
head(test_newnew)
##   PassengerId Pclass                                         Name    Sex  Age
## 1         892      3                             Kelly, Mr. James   male 34.5
## 2         893      3             Wilkes, Mrs. James (Ellen Needs) female 47.0
## 3         894      2                    Myles, Mr. Thomas Francis   male 62.0
## 4         895      3                             Wirz, Mr. Albert   male 27.0
## 5         896      3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female 22.0
## 6         897      3                   Svensson, Mr. Johan Cervin   male 14.0
##   SibSp Parch  Ticket    Fare Embarked
## 1     0     0  330911  7.8292        Q
## 2     1     0  363272  7.0000        S
## 3     0     0  240276  9.6875        Q
## 4     0     0  315154  8.6625        S
## 5     1     1 3101298 12.2875        S
## 6     0     0    7538  9.2250        S
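
Cabin sits at position 11 in the training data but at position 10 in the test data, because the test data has no Survived column. Dropping the column by name avoids having to track the position. A minimal sketch with toy data frames and a hypothetical helper `drop_column`:

```r
# Toy tables with Cabin at different positions (illustrative values).
train_like <- data.frame(PassengerId = 1:2, Survived = c(0, 1),
                         Cabin = c("", "C85"), Embarked = c("S", "C"))
test_like  <- data.frame(PassengerId = 3:4,
                         Cabin = c("", ""), Embarked = c("Q", "S"))

drop_column <- function(df, name) {
  # Keep every column except the named one; drop = FALSE keeps a data frame.
  df[, setdiff(names(df), name), drop = FALSE]
}

train_clean <- drop_column(train_like, "Cabin")
test_clean  <- drop_column(test_like, "Cabin")

print(names(train_clean))
print(names(test_clean))
```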

3.2 Subsetting data

prop.table(table(passenger_newnew$Survived == 1))
## 
##     FALSE      TRUE 
## 0.6161616 0.3838384
testF1 <- test_newnew[test_newnew$Pclass == 1 & test_newnew$Sex == "female",]

head(testF1)
##    PassengerId Pclass                                                    Name
## 13         904      1           Snyder, Mrs. John Pillsbury (Nelle Stevenson)
## 15         906      1 Chaffee, Mrs. Herbert Fuller (Carrie Constance Toogood)
## 23         914      1                    Flegenheim, Mrs. Alfred (Antoinette)
## 25         916      1         Ryerson, Mrs. Arthur Larned (Emily Maria Borie)
## 27         918      1                            Ostby, Miss. Helene Ragnhild
## 45         936      1        Kimball, Mrs. Edwin Nelson Jr (Gertrude Parsons)
##       Sex Age SibSp Parch      Ticket     Fare Embarked
## 13 female  23     1     0       21228  82.2667        S
## 15 female  47     1     0 W.E.P. 5734  61.1750        S
## 23 female  27     0     0    PC 17598  31.6833        S
## 25 female  48     1     3    PC 17608 262.3750        C
## 27 female  22     0     1      113509  61.9792        C
## 45 female  45     1     0       11753  52.5542        S

For example, we would like to predict whether Snyder, Mrs. John Pillsbury (Nelle Stevenson), aged 23, survived the shipwreck. We do so by calculating the probability of a 23-year-old woman surviving, using a tree diagram and our training data.

surv23 <- survivors[survivors$Age == 23 & survivors$Sex == "female",]
head(surv23)
##     PassengerId Survived Pclass                                         Name
## 89           89        1      1                   Fortune, Miss. Mabel Helen
## 394         394        1      1                       Newell, Miss. Marjorie
## 474         474        1      2 Jerwan, Mrs. Amin S (Marie Marthe Thuillard)
## 650         650        1      3              Stanley, Miss. Amy Zillah Elsie
##        Sex Age SibSp Parch          Ticket     Fare Embarked
## 89  female  23     3     2           19950 263.0000        S
## 394 female  23     1     0           35273 113.2750        C
## 474 female  23     0     0 SC/AH Basle 541  13.7917        C
## 650 female  23     0     0        CA. 2314   7.5500        S
prop.table(table(surv23$Pclass))
## 
##    1    2    3 
## 0.50 0.25 0.25

This shows that among the 23-year-old women who survived, half were in first class and a quarter each were in second and third class.

Following the branches of a tree diagram, the probability for a 23-year-old woman in first class is 0.383838 (the probability of a passenger surviving) multiplied by 0.68128655 (the probability of a survivor being a woman) multiplied by 0.016393443 (the probability of a woman survivor being aged 23) multiplied by 0.5 (the probability of a 23-year-old woman survivor being in first class):

0.38383838*0.68128655*0.016393443*0.5
## [1] 0.002143475

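Thus, the tree-diagram probability for Snyder, Mrs. John Pillsbury (Nelle Stevenson) surviving the shipwreck is 0.002143475. Because it multiplies four branch probabilities, this value is the probability of a passenger being a surviving 23-year-old woman in first class, which is why it is so small; it is meant to be compared against the threshold derived in Section 3.3.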
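
The tree-diagram product can also be computed from the data instead of typing in rounded constants. The sketch below uses toy records and a hypothetical function `tree_probability`; it assumes there are no missing ages and that every branch has at least one matching row (otherwise a division by zero yields NaN).

```r
# Toy records (illustrative, not from train.csv).
toy <- data.frame(
  Survived = c(1, 1, 1, 0, 0, 1, 0, 1),
  Sex      = c("female", "female", "male", "female", "male", "female", "male", "female"),
  Age      = c(23, 23, 23, 23, 40, 30, 23, 23),
  Pclass   = c(1, 2, 1, 1, 3, 1, 3, 1)
)

tree_probability <- function(df, sex, age, pclass) {
  # Narrow the data down one branch at a time.
  surv     <- df[df$Survived == 1, ]
  surv_sex <- surv[surv$Sex == sex, ]
  surv_age <- surv_sex[surv_sex$Age == age, ]
  surv_cls <- surv_age[surv_age$Pclass == pclass, ]

  p_survived <- nrow(surv) / nrow(df)            # P(survived)
  p_sex      <- nrow(surv_sex) / nrow(surv)      # P(sex | survived)
  p_age      <- nrow(surv_age) / nrow(surv_sex)  # P(age | sex, survived)
  p_class    <- nrow(surv_cls) / nrow(surv_age)  # P(class | age, sex, survived)

  p_survived * p_sex * p_age * p_class
}

p_tree <- tree_probability(toy, "female", 23, 1)

# The same quantity counted directly: matching survivors over all passengers.
p_direct <- mean(toy$Survived == 1 & toy$Sex == "female" &
                 toy$Age == 23 & toy$Pclass == 1)

print(c(p_tree = p_tree, p_direct = p_direct))
```

The intermediate counts cancel, so the product equals the number of matching survivors divided by the total number of passengers.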

testM3 <- test_newnew[test_newnew$Pclass == 3 & test_newnew$Sex == "male",]

head(testM3)
##    PassengerId Pclass                       Name  Sex  Age SibSp Parch
## 1          892      3           Kelly, Mr. James male 34.5     0     0
## 4          895      3           Wirz, Mr. Albert male 27.0     0     0
## 6          897      3 Svensson, Mr. Johan Cervin male 14.0     0     0
## 10         901      3    Davies, Mr. John Samuel male 21.0     2     0
## 11         902      3           Ilieff, Mr. Ylio male 27.0     0     0
## 18         909      3          Assaf, Mr. Gerios male 21.0     0     0
##       Ticket    Fare Embarked
## 1     330911  7.8292        Q
## 4     315154  8.6625        S
## 6       7538  9.2250        S
## 10 A/4 48871 24.1500        S
## 11    349220  7.8958        S
## 18      2692  7.2250        C

Another example is to predict whether Kelly, Mr. James, aged 34.5 and travelling in third class, would survive. For simplicity, we round his age up to 35.

surv35 <- survivors[survivors$Age == 35 & survivors$Sex == "male",]
head(surv35)
##     PassengerId Survived Pclass                             Name  Sex Age SibSp
## 605         605        1      1  Homer, Mr. Harry ("Mr E Haven") male  35     0
## 702         702        1      1 Silverthorne, Mr. Spencer Victor male  35     0
## 738         738        1      1           Lesurer, Mr. Gustave J male  35     0
##     Parch   Ticket     Fare Embarked
## 605     0   111426  26.5500        C
## 702     0 PC 17475  26.2875        S
## 738     0 PC 17755 512.3292        C
prop.table(table(surv35$Pclass))
## 
## 1 
## 1

According to our data, no man aged 35 in third class survived the shipwreck. Thus, it is most likely that Kelly, Mr. James did not survive the shipwreck.
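
The table above lists only first class because classes without any matching survivor are dropped. Tabulating against all three class levels makes the zeros explicit. A minimal sketch, rebuilding the three rows shown above in a small data frame:

```r
# The three surviving men aged 35 shown above (all first class).
surv35 <- data.frame(
  PassengerId = c(605, 702, 738),
  Pclass      = c(1, 1, 1),
  Sex         = "male",
  Age         = 35
)

# Use a factor with all three levels so that empty classes appear as 0
# instead of being left out of the table.
class_counts <- table(factor(surv35$Pclass, levels = 1:3))
class_share  <- prop.table(class_counts)

print(class_counts)
print(class_share)
```

With the zeros visible, the third-class branch of the tree diagram has probability 0, which is what the conclusion about Kelly, Mr. James rests on.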

3.3 Overall Conclusion

First and foremost, the calculated probabilities are in no way foolproof, as many variables not considered here may affect a passenger’s actual chance of survival.

However, based on the probabilities we calculated, the hypothesis I put forth is that a woman passenger (of a given age and class) is likely to have survived if her probability is greater than 0.00116515, and unlikely to have survived if it is less than 0.00116515.

Similarly, a man passenger (of a given age and class) is likely to have survived if his probability is greater than 0.0002652351, and unlikely to have survived if it is less than 0.0002652351.

These thresholds are calculated as follows:

survivorsM <- survivors[survivors$Sex == "male",]
survivorsF <- survivors[survivors$Sex == "female",]
# Remove the frequency values that are equal to 0
agg6 <- agg6[-(50:65),]
mean(agg6$Freq)
## [1] 0.01354577
agg5 <- agg5[-(50:65),]
mean(agg5$Freq)
## [1] 0.006504356

Assuming that a woman survivor of any age has the same probability, 0.33333, of being in first, second or third class,

the hypothesis is that a woman survives if her probability is greater than:

0.383838*0.68128655*0.01336675*0.333333
## [1] 0.00116515
#Probability of survival * probability of survivors being woman * mean probability of age of woman survivors * mean probability of each class

For men, assuming that a man survivor of any age has the same probability, 0.33333, of being in first, second or third class,

the hypothesis is that a man survives if his probability is greater than:

0.383838*0.318713465*0.006504356*0.333333
## [1] 0.0002652351
#Probability of survival * probability of survivors being man * mean probability of age of man survivors * mean probability of each class
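
The decision rule can be written as a small function. This is a sketch of the hypothesis as stated, using the two thresholds computed above; the names `thresholds` and `predict_survival` are hypothetical.

```r
# Thresholds taken from the calculations above.
thresholds <- c(female = 0.00116515, male = 0.0002652351)

predict_survival <- function(p, sex) {
  # TRUE when the tree-diagram probability exceeds the threshold for that sex.
  p > thresholds[[sex]]
}

# Snyder, Mrs. John Pillsbury: tree-diagram probability computed in Section 3.2.
snyder <- predict_survival(0.002143475, "female")

# Kelly, Mr. James: no surviving man aged 35 was in third class,
# so the last branch of his tree is 0 and the product is 0.
kelly <- predict_survival(0, "male")

print(c(snyder = snyder, kelly = kelly))
```

Under this rule, Snyder is predicted to have survived and Kelly is not, in line with the examples in Section 3.2.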